Turbofan engines are gas turbine engines often used in aircraft propulsion. In this document, we analyze data related to such engines. The aim is to model and predict the Remaining Useful Life (RUL) as accurately as possible. In a real-life setup, such a model can be used for:
The diagram below summarizes the structure of a turbofan engine:

The data set consists of multiple multivariate time series. Each time series is from a different engine, i.e., the data can be considered to be from a fleet of engines of the same type. Each engine starts with different degrees of initial wear and manufacturing variation which is unknown to the user. This wear and variation are considered normal, i.e., it is not considered a fault condition. Three operational settings have a substantial effect on engine performance. These settings are also included in the data. The data is contaminated with sensor noise.
The engine is operating normally at the start of each time series, and develops a fault at some point during the series: the fault grows in magnitude until system failure.
The general modeling outline is described in the diagram below and is as follows:

In this notebook we model the engine failures from two different points of view, trying to answer the following questions:
How many cycles are left until the engine fails?
In this case we deal with regression models.
Will the engine fail in less than 10 cycles? What is the probability?
This case falls into the category of classification modeling.
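Both questions rely on a target derived from the cycle counter. The snippet below is a minimal sketch of how the RUL could be computed, assuming each training series runs until failure; the column names `engine_id` and `cycle` are hypothetical and not taken from the original data.

```python
import pandas as pd

def add_rul(df: pd.DataFrame, id_col: str = "engine_id", cycle_col: str = "cycle") -> pd.DataFrame:
    """Return a copy of df with an RUL column (cycles left until the last recorded cycle)."""
    out = df.copy()
    # Last recorded cycle of each engine, broadcast back to every row of that engine.
    last_cycle = out.groupby(id_col)[cycle_col].transform("max")
    out["RUL"] = last_cycle - out[cycle_col]
    return out

# Toy data (made up): engine 1 runs for 3 cycles, engine 2 for 2 cycles.
toy = pd.DataFrame({"engine_id": [1, 1, 1, 2, 2], "cycle": [1, 2, 3, 1, 2]})
toy_rul = add_rul(toy)
```

With this definition, the last recorded cycle of every engine has RUL equal to 0.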
We use existing data science libraries that provide reliability and speed, combined with a custom-made library that focuses on the specific problems of predictive maintenance.
Now that the data is loaded, we do the following pre-processing:
After the data cleaning, we visualize the distribution of the remaining features to make sure nothing stands out (outliers and unexpected behaviors).
On the left plot, we observe that some features are highly correlated. On the right plot, we confirm they have been removed.
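One common way to remove highly correlated features is sketched below. It is an illustration, not necessarily the procedure used in this notebook: the helper `drop_correlated` and the 0.95 threshold are assumptions, and the data is synthetic.

```python
import numpy as np
import pandas as pd

def drop_correlated(df: pd.DataFrame, threshold: float = 0.95):
    """Drop every column whose absolute correlation with an earlier column exceeds threshold."""
    corr = df.corr().abs()
    # Keep only the strict upper triangle so each pair of columns is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop), to_drop

# Synthetic data: "b" is almost a linear copy of "a", "c" is independent noise.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
toy = pd.DataFrame({
    "a": a,
    "b": 2 * a + 0.01 * rng.normal(size=200),
    "c": rng.normal(size=200),
})
reduced, dropped = drop_correlated(toy)
```

Of each correlated pair, the column that appears later in the table is the one dropped, so the column order decides which sensor is kept.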
Next, let's plot the distribution of the sensor data.
It seems like some values are constant. We could have guessed this by looking at the raw table. We clean those constant columns too.
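Constant columns carry no information for a model. A minimal sketch of how they could be detected and removed is shown below; the helper name and the toy table are hypothetical.

```python
import pandas as pd

def drop_constant_columns(df: pd.DataFrame):
    """Drop columns that hold a single distinct value."""
    constant = [col for col in df.columns if df[col].nunique(dropna=False) <= 1]
    return df.drop(columns=constant), constant

# Toy table (made up): "sensor_b" never changes.
toy = pd.DataFrame({
    "sensor_a": [1.0, 1.2, 1.1],
    "sensor_b": [5.0, 5.0, 5.0],
    "sensor_c": [0.3, 0.1, 0.2],
})
cleaned, removed = drop_constant_columns(toy)
```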
The table above is clean and ready to be used. Let's visualize some of the time series separately. On the X axis, we show the Remaining Useful Life (RUL). As the RUL approaches 0, the engine is about to fail.
On the graph below, select the variable to plot in the dropdown menu. Variable PhysCoreSpeed seems quite relevant, as its values diverge as the RUL decreases.
In this section we focus on developing a model that answers the question of how many cycles are left until the engine fails. From the machine learning point of view, this means we train a regression model.
We perform a comparison of 3 different models with several combinations of hyperparameters and automatically select the best one:
When selecting a model, we respect the principles of cross-validation to avoid overfitting.
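The sketch below illustrates cross-validated model selection with scikit-learn. It is not the notebook's actual search: the two candidate models, their hyperparameter grids and the synthetic data are assumptions. Splitting folds by engine with `GroupKFold` is also an assumption; it keeps all cycles of one engine in the same fold, so the validation error reflects engines the model has not seen.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, GroupKFold

# Synthetic fleet: 6 engines, 40 cycles each, RUL decreasing with the cycle count.
rng = np.random.default_rng(0)
n_engines, n_cycles = 6, 40
groups = np.repeat(np.arange(n_engines), n_cycles)   # engine id of each row
cycle = np.tile(np.arange(n_cycles), n_engines)
X = np.column_stack([cycle, rng.normal(size=cycle.size)])
y = (n_cycles - 1 - cycle) + rng.normal(scale=1.0, size=cycle.size)

# Hypothetical candidates; the models compared in the notebook may differ.
candidates = {
    "ridge": (Ridge(), {"alpha": [0.1, 1.0, 10.0]}),
    "random_forest": (
        RandomForestRegressor(n_estimators=20, random_state=0),
        {"max_depth": [3, 6]},
    ),
}

cv = GroupKFold(n_splits=3)
results = {}
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, cv=cv, scoring="neg_mean_absolute_error")
    search.fit(X, y, groups=groups)
    # best_score_ is the negated MAE, so flip the sign to report an error.
    results[name] = (-search.best_score_, search.best_params_)

# Model family with the lowest cross-validated mean absolute error.
best_name = min(results, key=lambda name: results[name][0])
```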
Below we show the top 5 models.
The "test" error is what we can expect to see on new unseen data.
Below we graph the actual RUL versus the predictions. On the left scatter plot, ideally, they should be equal (i.e. represented by the diagonal line). Generally speaking, the point cloud looks as expected for a well-fitted model: its spread is proportional to the magnitude of the remaining useful life and there are no outliers.
On the right plot we show the actual versus predicted RUL for a selection of engines. Note that for 2 of those engines (39 and 71), the model predicts the RUL quite well. For engine 96, the model underestimates the RUL, while for engine 39 the model overestimates it.
Next, we analyze the trained model and infer how each of the sensors contributes to the decrease of RUL. Conclusions are stated at the bottom.
From the SHAP plot above we conclude that:
StaticHPCOutletPres: higher values of pressure indicate lower RUL.
PhysCoreSpeed and CorrCoreSpeed)
Similar conclusions are drawn from the Partial Dependence Plots (PDP) below. PDPs show the expected change in RUL versus each variable taken independently. In other words, they show the marginal effect of each of the variables or sensors. Note that on the y axis we show the change of RUL.
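To make the idea of a partial dependence curve concrete, the sketch below computes one by hand: the feature of interest is fixed to each grid value for every row, and the model predictions are averaged. The model, the synthetic data and the helper `partial_dependence_1d` are assumptions for illustration, and the centering used to express the curve as a change of RUL is one possible choice.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def partial_dependence_1d(model, X: np.ndarray, feature: int, grid: np.ndarray) -> np.ndarray:
    """Average prediction when column `feature` is forced to each value in grid."""
    X_mod = X.copy()
    averages = []
    for value in grid:
        X_mod[:, feature] = value          # overwrite the feature for every row
        averages.append(model.predict(X_mod).mean())
    return np.array(averages)

# Synthetic data: the target (standing in for RUL) falls as feature 0 rises.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(300, 2))
y = 100 - 50 * X[:, 0] + rng.normal(scale=1.0, size=300)

model = RandomForestRegressor(n_estimators=30, random_state=0).fit(X, y)
grid = np.linspace(0.05, 0.95, 5)
pdp = partial_dependence_1d(model, X, feature=0, grid=grid)

# Express the curve as a change relative to its first grid point.
pdp_change = pdp - pdp[0]
```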
In this section, we focus on the case where an engine operator needs a real-time alert system that raises an alarm when the engine is likely to fail in less than 10 cycles. In other words, we answer the following question:
Will the engine fail in less than 10 cycles? With what probability?
We utilise the cleaned dataset that has been discussed above, but change the way of modeling the target. Instead of modeling "number of cycles", we model the probability that the engine fails within 10 cycles. Because we are dealing with binary outcomes (failure/no failure), we model the target as a binary classification problem.
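As a minimal sketch, the binary target can be derived from the RUL as follows. The variable names and the RUL values are made up; the strict inequality mirrors the wording "less than 10 cycles".

```python
import numpy as np

# Hypothetical RUL values for a handful of rows.
rul = np.array([25, 12, 10, 9, 3, 0])

horizon = 10  # alert horizon in cycles, as stated in the text
# 1 when the engine fails in less than `horizon` cycles, 0 otherwise.
fails_soon = (rul < horizon).astype(int)
```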
The approach we follow is very similar to the one above, except that now we use classification models:
We train the 3 types of models with different combinations of hyperparameters, respecting the principles of cross-validation to avoid overfitting. The best model is chosen according to the F1 score.
Below we show the top 5 models. Again, in this case, tree-based models outperform neural networks and logistic regression models, both in accuracy and in training time.
Below we graph the predicted probability of failure given by the model versus the actual remaining useful life, for one of the engines. The model outputs reasonable probabilities on most occasions. Its confidence grows as the failure time approaches.
Classification models return a probability of failure. If we define a threshold, we can translate this probability to a binary variable: 1 indicates failure, 0 indicates no failure. Below, we select a few thresholds and compare how many failures are predicted correctly (true positives) versus how many false positives we observe. A false positive means we predict "failure" when in fact the engine did not fail.
There is a certain trade-off between true and false positives. Depending on the cost of a false positive and the benefit from true positives, we can establish a threshold to be used in practice.
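The trade-off can be illustrated with a small threshold sweep. The labels and probabilities below are made up and the helper `count_positives` is hypothetical; the point is only to show how true and false positives move together as the threshold changes.

```python
import numpy as np

def count_positives(y_true: np.ndarray, proba: np.ndarray, threshold: float):
    """Return (true positives, false positives) when alerting at proba >= threshold."""
    pred = proba >= threshold
    tp = int(np.sum(pred & (y_true == 1)))
    fp = int(np.sum(pred & (y_true == 0)))
    return tp, fp

# Made-up labels (1 = fails within 10 cycles) and predicted probabilities.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
proba = np.array([0.05, 0.20, 0.40, 0.70, 0.35, 0.60, 0.80, 0.95])

counts = {t: count_positives(y_true, proba, t) for t in (0.3, 0.5, 0.9)}
# counts == {0.3: (4, 2), 0.5: (3, 1), 0.9: (1, 0)}
```

In this toy example, a low threshold catches all 4 failures at the cost of 2 false alarms, while a high threshold raises no false alarm but catches only 1 failure.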
It is interesting to look at the ratios too, and not just at the absolute values. We define the following metrics, which are universal for all classification problems. The reader is referred to this article for further information:
Looking at the ROC-Precision graph below is another way to choose an optimal threshold based on the business needs. The table shows the same data as the graph.
If we select a threshold of 0.5, we can expect a false positive rate of 0.8%, with precision and recall above 80%. This seems reasonable from a business perspective. Below we show the confusion matrix for this threshold.
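For reference, precision, recall and the false positive rate all follow from the four cells of the confusion matrix. The sketch below uses scikit-learn on made-up labels and probabilities, so the resulting numbers are unrelated to the figures reported above.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up labels (1 = fails within 10 cycles) and predicted probabilities.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
proba = np.array([0.05, 0.20, 0.40, 0.70, 0.35, 0.60, 0.80, 0.95])

threshold = 0.5
y_pred = (proba >= threshold).astype(int)

# For binary labels, ravel() yields the cells in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)            # share of alerts that were real failures
recall = tp / (tp + fn)               # share of real failures that were flagged
false_positive_rate = fp / (fp + tn)  # share of healthy rows that raised an alert
```

On this toy data, precision and recall are both 0.75 and the false positive rate is 0.25.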
In this notebook we dealt with data related to turbofan engines. We pre-processed a dataset and visualized it to spot outliers and unexpected behaviors. Once it was clean and ready to be ingested by machine learning models, we trained two different types of models:
During the modeling phase, we tested 3 different types of models with numerous hyperparameter combinations, always respecting the principles of cross-validation and selecting the best one objectively based on relevant metrics.
From the regression model, we concluded that the number of cycles since the engine started is the most relevant factor when predicting the remaining useful life. The next most important variable is StaticHPCOutletPres, so further analysis together with turbofan operators should be conducted to understand the physical implication of this variable. We also concluded that the operational characteristics of the engines are not relevant features.
When predicting whether the engine will fail in less than 10 cycles, the model achieves a precision of 80-90% with a false positive rate of 0.1-0.8%. A real-time alert system could be implemented using the developed methodology.
From the data and modeling point of view, this problem could be improved in two ways:
On a practical level, it would be interesting to answer further business questions. For example: